-
Notifications
You must be signed in to change notification settings - Fork 32
🐛Director-v2: properly close dask client when use is completed #7880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛Director-v2: properly close dask client when use is completed #7880
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7880 +/- ##
==========================================
+ Coverage 87.26% 87.88% +0.61%
==========================================
Files 1592 1428 -164
Lines 60884 59218 -1666
Branches 1213 612 -601
==========================================
- Hits 53131 52043 -1088
+ Misses 7411 6975 -436
+ Partials 342 200 -142
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
537a1c8 to
d2d799d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces reference counting for Dask clients to ensure they are closed when no longer in use, reducing misleading error logs and resource leaks.
Key changes:
- Added
refparameter toDaskClientsPool.acquireand implemented_client_refstracking along with a newrelease_client_refmethod. - Updated all
acquirecall sites in the scheduler and tests to pass a unique reference and release it when the pipeline finishes. - Added new unit tests for verifying reference-counting behavior and wrapped logging around client creation in
log_context.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| services/director-v2/tests/unit/with_dbs/test_utils_dask.py | Updated acquire calls in tests to include ref argument |
| services/director-v2/tests/unit/test_modules_dask_clients_pool.py | Updated tests to pass distinct refs and added test_dask_clients_pool_reference_counting |
| services/director-v2/src/simcore_service_director_v2/modules/dask_clients_pool.py | Implemented _client_refs, added release_client_ref, and modified acquire to require ref |
| services/director-v2/src/simcore_service_director_v2/modules/dask_client.py | Replaced manual info logs with log_context around client creation |
| services/director-v2/src/simcore_service_director_v2/modules/comp_scheduler/_scheduler_dask.py | Passed run_id into acquire calls and added _release_resources to free clients |
| services/director-v2/src/simcore_service_director_v2/modules/comp_scheduler/_scheduler_base.py | Declared abstract _release_resources and invoked it on pipeline completion |
| packages/service-library/src/servicelib/rabbitmq/rpc_interfaces/clusters_keeper/clusters.py | Defined a constant RPC method name with TypeAdapter.validate_python |
Comments suppressed due to low confidence (2)
services/director-v2/src/simcore_service_director_v2/modules/comp_scheduler/_scheduler_dask.py:59
- [nitpick] This constant is a template string, not a literal ref. Consider renaming it to
_DASK_CLIENT_RUN_REF_TEMPLATEto clarify its role and avoid confusion.
_DASK_CLIENT_RUN_REF: Final[str] = "{user_id}:{comp_run.run_id}"
services/director-v2/src/simcore_service_director_v2/modules/dask_clients_pool.py:121
- The indentation of this
with log_contextblock is inconsistent relative to the surrounding code. Align it with theasync withblock so that the log context cleanly wraps the client acquisition logic.
with log_context(
services/director-v2/src/simcore_service_director_v2/modules/dask_clients_pool.py
Outdated
Show resolved
Hide resolved
packages/service-library/src/servicelib/rabbitmq/rpc_interfaces/clusters_keeper/clusters.py
Show resolved
Hide resolved
94eb7db to
e5b3e39
Compare
|
@mergify queue |
✅ The pull request has been merged automaticallyThe pull request has been merged automatically at ddc3e74 |
e5b3e39 to
0ec191d
Compare
|



What do these changes do?
While testing for issues within the computational backend, many non critical logs were found about
the dask client were acquired once and cached in the director-v2 and never closed, leading to that error message (which is not a problem but adds to confusion).
This PR aims to reference count the current pipeline for that specific user/wallet and when that count is down to 0, properly closes the client.
Related issue/s
How to test
Dev-ops